JOURNAL OF VIROLOGY, Jan. 2003, p. 1415-1426 
0022-538X/03/$08.00+0 DOI: 10.1128/JVI.77.2.1415-1426.2003 


Vol. 77, No. 2 


Copyright © 2003, American Society for Microbiology. All Rights Reserved. 


The 3C-Like Proteinase of an Invertebrate Nidovirus Links 
Coronavirus and Potyvirus Homologs 


John Ziebuhr,1* Sonja Bayer,1 Jeff A. Cowley,2 and Alexander E. Gorbalenya3 


Institute of Virology and Immunology, University of Würzburg, Würzburg, Germany1; Cooperative 
Research Center for Aquaculture, CSIRO Livestock Industries, Long Pocket Laboratories, 
Indooroopilly, Australia2; and Department of Medical Microbiology, Center of Infectious 
Diseases, Leiden University Medical Center, Leiden, The Netherlands3 


Received 27 June 2002/Accepted 15 October 2002 


Gill-associated virus (GAV), a positive-stranded RNA virus of prawns, is the prototype of newly recognized 
taxa (genus Okavirus, family Roniviridae) within the order Nidovirales. In this study, a putative GAV cysteine 
proteinase (3C-like proteinase [3CLpro]), which is predicted to be the key enzyme involved in processing of the 
GAV replicase polyprotein precursors, pp1a and pp1ab, was characterized. Comparative sequence analysis 
indicated that, like its coronavirus homologs, 3CLpro has a three-domain organization and is flanked by 
hydrophobic domains. The putative 3CLpro domain including flanking regions (pp1a residues 2793 to 3143) 
was fused to the Escherichia coli maltose-binding protein (MBP) and, when expressed in E. coli, was found to 
possess N-terminal autoprocessing activity that was not dependent on the presence of the 3CLpro C-terminal 
domain. N-terminal sequence analysis of the processed protein revealed that cleavage occurred at the location 
2827LVTHE | VRTGN2836. The trans-processing activity of the purified recombinant 3CLpro (pp1a residues 
2832 to 3126) was used to identify another cleavage site, 6441KVNHE | LYHVA6450, in the C-terminal pp1ab 
region. Taken together, the data tentatively identify VxHE | (L,V) as the substrate consensus sequence for the 
GAV 3CLpro. The study revealed that the GAV and potyvirus 3CLpros possess similar substrate specificities 
which correlate with structural similarities in their respective substrate-binding sites, identified in sequence 
comparisons. Analysis of the proteolytic activities of MBP-3CLpro fusion proteins carrying replacements of 
putative active-site residues provided evidence that, in contrast to most other 3C/3CLpros but in common with 
coronavirus 3CLpros, the GAV 3CLpro employs a Cys2968-His2879 catalytic dyad. The properties of the GAV 
3CLpro define a novel RNA virus proteinase variant that bridges the gap between the distantly related 
chymotrypsin-like cysteine proteinases of coronaviruses and potyviruses. 


Gill-associated virus (GAV) is an enveloped, rod-shaped, 
positive-stranded RNA virus that infects Penaeus monodon 
(black tiger) prawns in Australia (8, 41). While subclinical 
GAV infections, originally reported as lymphoid organ virus, 
are highly prevalent in both wild and farmed P. monodon (41), 
acute infections causing mortality have also been reported 
(40). GAV is closely related morphologically and genetically to 
yellow head virus (7, 10, 41), which is associated with yellow 
head disease and has caused considerable production losses in 
P. monodon farmed throughout southeast Asia (5). 

GAV and yellow head virus have recently been placed in a 
new genus, Okavirus, within a new family, Roniviridae (8, 11), 
that, together with the Coronaviridae and Arteriviridae, forms 
the order Nidovirales (6, 12). The phylogenetic relationship 
between GAV and nidoviruses became evident from compar- 
ative sequence analyses of the 20-kb 5’-terminal region of the 
GAV genome (8), which revealed striking similarities in the 
organization and expression of the viral replicase genes. In 
common with nidoviruses, the 5’-terminal replicase gene of 
GAV encodes two large open reading frames, ORF1a and 
ORF1b, comprising 12,248 and 7,941 nucleotides, respectively. 
In vitro data also demonstrated that the downstream ORF1b, 
which overlaps ORF1a by 99 nucleotides, is expressed by ribosomal 
frameshifting, as in all nidoviruses. Most probably, slippage 
into the -1 frame occurs at the sequence 12215AAAUUUU12221 
and involves an RNA pseudoknot located immediately downstream 
of this slippery sequence (8). Accordingly, ORFs 1a and 
1b are translated as two polyproteins, pp1a (460 kDa) and its 
C-terminally extended form, pp1ab (758 kDa), which are expected 
to mediate the functions required for genome replication 
and transcription of a 3’-coterminal nested set of subgenomic 
mRNAs encoding the viral structural proteins (9). 
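
The -1 frameshift described above can be illustrated with a minimal sketch. It assumes the textbook simultaneous-slippage model, in which the ribosome reads through the slippery heptanucleotide in the zero frame and then resumes one nucleotide back, so that the last nucleotide of the heptamer is read twice. The toy RNA and the function name `frameshift_codons` are hypothetical and are not taken from the GAV genome; only the heptamer AAAUUUU comes from the text.

```python
def frameshift_codons(rna, slip="AAAUUUU"):
    """Split an ORF into codons, applying a -1 shift after the slippery site.

    Assumes the ORF starts at index 0 and that the heptamer sits in the
    X_XXY_YYZ register (its first nucleotide is the last base of a codon).
    """
    start = rna.find(slip)
    if start < 0:
        raise ValueError("slippery sequence not found")
    if (start + 1) % 3 != 0:
        raise ValueError("slippery sequence is not in the X_XXY_YYZ register")
    end = start + len(slip)  # index just past the heptamer
    # Zero-frame codons up to and including the last codon of the heptamer.
    upstream = [rna[k:k + 3] for k in range(0, end, 3)]
    # After the -1 slip, decoding resumes at the last heptamer nucleotide.
    restart = end - 1
    downstream = [rna[k:k + 3] for k in range(restart, len(rna) - 2, 3)]
    return upstream + downstream


# Hypothetical toy sequence: 5-nt leader, the heptamer, then 9 more nt.
toy_rna = "AUGGC" + "AAAUUUU" + "AGCCUAAGG"
print(frameshift_codons(toy_rna))
```

In this toy case the nucleotide U at the end of the heptamer appears both in the last zero-frame codon (UUU) and in the first shifted codon (UAG), which is the defining feature of a -1 shift.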

Comparative sequence analysis revealed several putative 
functional domains in the GAV polyproteins, including heli- 
case and polymerase motifs, ordered similarly to the cognate 
domains in the viral polyproteins of other nidoviruses (8). This 
observation, combined with the fact that the GAV polymerase 
domain contains the SDD motif unique to nidovirus poly- 
merases, strongly suggested that GAV (infecting invertebrates) 
and nidoviruses (infecting vertebrates) have a common ances- 
tor (14). However, the presence of a number of regions with 
low sequence similarity in ORF1b and, in particular, the 
extremely poor pp1a conservation suggested that GAV 
has diverged significantly from the vertebrate nidoviruses 
(corona- and arteriviruses). Indeed, the only region in pp1a 
with significant sequence similarity proved to be a putative 
chymotrypsin-like (3C-like) proteinase domain (3CLpro), 
flanked by hydrophobic (probably membrane-spanning) domains. 




In vertebrate nidoviruses, the 3CLpro cleaves the viral 
polyproteins at multiple conserved sites and is responsible for 
posttranslational release of the key replicative proteins. It has 
therefore also been referred to as the main proteinase (Mpro) 
to distinguish it from accessory nidovirus proteinases, which 
cleave at only a few sites in the N-terminal pp1a/pp1ab regions 
(51). Although no 3CLpro cleavage sites could be readily predicted 
in the pp1a/pp1ab polyproteins of this invertebrate 
nidovirus, it seems likely that this GAV proteinase may have a 
similar critical role in viral replication, as has been demonstrated 
conclusively for its vertebrate nidovirus homologs (8, 
51). Based on sequence comparisons, it has been proposed that 
the GAV 3CLpro is distantly related to the main proteinases of 
arteri- and coronaviruses as well as the NIa proteinases of 
plant potyviruses, which all have an (E,Q) | (G,S,A) substrate 
specificity (8). Throughout this article, amino acid residues 
flanking the scissile bond (indicated by | ) are given from N to 
C terminus in the single-letter code, where x indicates any 
residue. If various residues are found at a given position, these 
are listed in parentheses. 
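
The cleavage-site notation defined above can be expanded mechanically. The following sketch, which is only an illustration of the notation and not part of the original analysis, enumerates every sequence that a pattern such as (E,Q) | (G,S,A) stands for. The function name `expand_site` and the 20-letter alphabet constant are assumptions introduced here.

```python
import itertools
import re

# Standard 20 amino acids in one-letter code, substituted for "x".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A token is a parenthesized list "(E,Q)", a single residue or "x", or "|".
TOKEN = re.compile(r"\(([A-Z](?:,[A-Z])*)\)|([A-Zx])|(\|)")


def expand_site(pattern):
    """Return all sequences matched by a cleavage-site pattern, keeping '|'."""
    compact = pattern.replace(" ", "")
    choices = []
    pos = 0
    while pos < len(compact):
        match = TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"cannot parse pattern at position {pos}")
        group, single, bar = match.groups()
        if group:
            choices.append(group.split(","))      # alternatives at one position
        elif single == "x":
            choices.append(list(AMINO_ACIDS))     # any residue
        elif single:
            choices.append([single])              # one fixed residue
        else:
            choices.append(["|"])                 # the scissile bond marker
        pos = match.end()
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_site("(E,Q) | (G,S,A)"))
```

For the canonical nidovirus and potyvirus specificity this yields six P1-P1' dipeptides; a pattern containing x expands to 20 alternatives at that position.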

In this report, we provide direct evidence for the predicted 
proteolytic function of GAV 3CLpro. Predictions of putative 
active-site residues identified by sequence comparisons were 
substantiated by site-directed mutagenesis, and information on 
the GAV 3CLpro substrate specificity was obtained. The theoretical 
and experimental data presented in this study define a 
new member of the constantly growing group of viral 3C-like 
proteinases, which may combine the Cys-His catalytic dyad of 
the main proteinase of coronaviruses with a potyvirus-like sub- 
strate-binding pocket. 


MATERIALS AND METHODS 


Expression of GAV pp1a/pp1ab sequences. The cDNA clones pGCLP7.6 and 
pGAV1b-3’-9 (J. A. Cowley, unpublished data) were used as templates for PCR 
amplification of GAV sequences. The sequences of pGCLP7.6 and pGAV1b-3’-9 
deviated at several positions from the sequence reported previously (GenBank 
accession number AF227196), which was derived from multiple random 
reverse transcription-PCR products generated from total RNA isolated from 
pooled lymphoid organs of GAV-infected P. monodon (8). The pp1a/pp1ab 
sequence used in this study contained nucleotide changes that led to four amino 
acid substitutions: Cys*°”*Arg, Ala*!!°Thr, Ser?!?’Leu, and His®*!Tyr. 

DNA sequences encoding different GAV pp1a/pp1ab regions were amplified 
by PCR with the primers listed in Table 1. The PCR products were treated with 
T4 DNA polymerase, phosphorylated with T4 polynucleotide kinase, digested 
with EcoRI, and inserted into the XmnI and EcoRI sites of pMal-c2 (New 
England Biolabs, Frankfurt, Germany). The resulting plasmids, which are shown 
in Table 1, allowed the expression of GAV pp1a/pp1ab sequences fused to the 
maltose-binding protein (MBP) of Escherichia coli (Fig. 1). Site-directed mutagenesis 
was done by a recombination-PCR method (19, 47). E. coli TB1 cells 
transformed with the appropriate pMal-c2 derivatives (Table 1) were grown at 
37°C in Luria-Bertani (LB) medium containing 100 µg of ampicillin per ml until 
they reached a culture density (A595) of 0.6. Expression of the recombinant 
proteins was induced by addition of 0.5 mM isopropyl-β-D-thiogalactopyranoside 
(IPTG) for 3 h at 24°C. For analysis of recombinant protein expression, aliquots 
of the cell cultures were suspended in 2× Laemmli sample buffer and heated at 
94°C for 3 min, and the lysates were analyzed by electrophoresis in sodium 
dodecyl sulfate (SDS)-polyacrylamide gels and Western immunoblotting with 
standard protocols. 

Proteins 2832-3126, 2832-3126_C2968A, MBP-2948-3143, and MBP-6338-6673 
(Table 1) were purified by amylose affinity chromatography as described previously 
(17, 50). Two of the proteins, 2832-3126 and 2832-3126_C2968A, were 
purified further. To this end, the affinity-purified fusion proteins were subjected 
to cleavage by factor Xa (Amersham Biosciences, Freiburg, Germany) and 
loaded onto phenyl-Sepharose HP columns (Amersham Biosciences) that had 
been preequilibrated with buffer containing 20 mM Tris-HCl (pH 7.5), 600 mM 
NaCl, 1 mM dithiothreitol, and 0.1 mM EDTA. The GAV-specific proteins were 
eluted with 20 mM Tris-HCl (pH 7.5)-1 mM dithiothreitol-0.1 mM EDTA, 
concentrated (Centricon-3; Millipore), and loaded onto a Superdex 75 column 
(Amersham Biosciences), which was run under isocratic conditions with 20 mM 
Tris-HCl (pH 7.5)-150 mM NaCl-1 mM dithiothreitol-0.1 mM EDTA. The 
purified proteins were concentrated to 5 mg/ml (Centricon-3) and stored at 
-80°C. 

N-terminal protein sequence analysis. Following SDS-polyacrylamide gel elec- 
trophoresis (PAGE), the proteins were transferred to polyvinylidene difluoride 
membranes (162-0180; Bio-Rad Laboratories, Munich, Germany) and subse- 
quently stained with Coomassie brilliant blue. The membrane regions containing 
the proteins of interest were isolated as described previously (49), and the 
proteins were subjected to six cycles of Edman degradation by use of a pulsed- 
liquid protein sequencer (ABI 467A; Applied Biosystems, Weiterstadt, Germa- 
ny). 
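
How an Edman read is mapped back onto the precursor can be sketched as follows. The example uses only the ten residues flanking the N-terminal cleavage site reported in this study (2827LVTHEVRTGN2836) and a five-residue read; the function name `locate_n_terminus` is hypothetical, and a real analysis would search the full precursor sequence with all six cycles.

```python
def locate_n_terminus(fragment, fragment_start, edman_read):
    """Return the polyprotein number of the first residue of an Edman read.

    fragment       -- known precursor sequence around the expected site
    fragment_start -- polyprotein number of the first residue in `fragment`
    edman_read     -- residues called in successive Edman cycles
    """
    offset = fragment.find(edman_read)
    if offset < 0:
        raise ValueError("read does not occur in the fragment")
    if fragment.find(edman_read, offset + 1) >= 0:
        raise ValueError("read occurs more than once; position is ambiguous")
    return fragment_start + offset


# Sequence and numbering as reported for the N-terminal 3CLpro site.
p1_prime = locate_n_terminus("LVTHEVRTGN", 2827, "VRTGN")
print(p1_prime)  # residue number of the new N terminus (P1' position)
```

The scissile bond then lies between the returned residue and the one preceding it.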

Preparation of antiserum α-MBP-2948-3143. The MBP-2948-3143 fusion protein 
was purified by amylose affinity chromatography from TB1[pMal-GAV-2948-3143] 
cells as described above. The protein was cleaved with factor Xa 
(Amersham Biosciences) and used to immunize rabbits as described previously 
(49). The antiserum was designated α-MBP-2948-3143. 

trans-cleavage assay. Typical 20-µl reaction mixes contained recombinant 
GAV 3CLpro (2832-3126 or 2832-3126_C2968A) and the substrate protein, MBP-6338-6673 
(each at 1.6 µM), in a buffer containing 20 mM Tris-HCl (pH 7.5), 200 
mM NaCl, 1 mM EDTA, and 1 mM dithiothreitol. Following incubation at 22°C 
for 16 h, the reaction products were separated on SDS-15% polyacrylamide gels 
that were stained with Coomassie brilliant blue R-250. 
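
As a worked example of the reaction setup, the amount of each protein in a 20-µl mix at 1.6 µM follows directly from volume times concentration. The helper names below are hypothetical, and the 30-kDa mass used for the mass conversion is an arbitrary placeholder, not a value from this study.

```python
def picomoles(volume_ul, concentration_um):
    """Amount in pmol: 1 ul x 1 uM = 1e-6 l x 1e-6 mol/l = 1 pmol."""
    return volume_ul * concentration_um


def micrograms(amount_pmol, mass_kda):
    """Mass in ug: 1 pmol of a 1-kDa protein weighs 1 ng."""
    return amount_pmol * mass_kda / 1000.0


amount = picomoles(20, 1.6)            # about 32 pmol of each protein
print(amount)
print(micrograms(amount, 30.0))        # placeholder mass of 30 kDa (assumption)
```

Enzyme and substrate are therefore present in equimolar amounts of roughly 32 pmol each.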

Computer-aided comparative sequence analyses. Amino acid sequences were 
derived from the Genpeptides database. 3CLpro sequence alignments were produced 
with the Clustal X program (42) and the BLOSUM series of inter-residue 
scoring tables (18). The virus interfamily alignments were generated in the 
profile mode. The alignments obtained were used in the PHD program (34, 35) 
to predict secondary structures and also to build profiles with the Profileweight 
program (43). These profiles were compared in pairs with the Proplot program 
(43). Two profiles, where one profile may be a sequence, were compared by 
sliding a window of the selected length along each possible register for a given 
dot plot. Several window lengths were tested. Matches between two profiles that 
were within the top 0.05% or between the top 0.1% and 0.05% were marked by 
two different types of dots. 
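
The sliding-window comparison described above can be sketched in simplified form. The example below is not the Proplot procedure: it compares two plain sequences rather than weighted profiles and scores each window pair by the number of identical residues, an assumption made to keep the sketch self-contained. The function names are hypothetical.

```python
def window_identity_scores(seq_a, seq_b, window):
    """Score every register (i, j) by identities within a window of fixed length."""
    scores = {}
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            pairs = zip(seq_a[i:i + window], seq_b[j:j + window])
            scores[(i, j)] = sum(a == b for a, b in pairs)
    return scores


def top_hits(scores, fraction):
    """Return the registers whose scores fall within the top `fraction` of all scores."""
    keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


# Toy input: a sequence compared with itself gives hits on the main diagonal.
toy = "ACDEFGHIK"
scores = window_identity_scores(toy, toy, window=3)
print(top_hits(scores, 0.1))
```

Plotting the retained registers as dots, with a second symbol for the next-best tier, reproduces the kind of two-level dot plot described in the text.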


RESULTS 




Comparative sequence analysis of GAV 3CLpro with chymotrypsin-like 
cysteine proteinases of positive-stranded RNA viruses. 
We first sought to refine the previously published sequence 
comparison of the putative GAV proteinase (8) to 
provide a theoretical basis of sufficient reliability for subse- 
quent experimental studies. Specifically, we tried to gain initial 
insight into the substrate specificity and possible active-site 
residues. 

Comparison of the entire replicase gene revealed that, 
among all viruses sequenced to date, the Coronaviridae repre- 
sent the most closely related family to GAV (unpublished 
data). In the case of the 3CLpro, however, the most significant 
matches were found in homologs from the Potyviridae family 
(8) (data not shown). Comparison of the GAV 3CLpro with 
both corona- and potyvirus 3CLpros revealed conservation of 
two regions: (i) the segment containing the catalytic His residue, 
which is most similar between the GAV and coronavirus 
3CLpros, and (ii) the segment containing the catalytic Cys residue, 
which is most similar between the GAV and potyvirus 
3CLpros (Fig. 2). No conservation was evident in the segment 
between the catalytic His and Cys residues, which contains the 
catalytic Asp residue of potyvirus (and many other) 3C-like 
proteinases. 

To dissect this region further, the GAV 3CLpro was compared 
with a combined and structurally corrected (2) alignment 
of corona- and potyvirus 3CLpros (17) with the global 

TABLE 1. Oligonucleotides used for the amplification or mutagenesis of GAV sequences 

Oligonucleotides used for cloning or         Plasmid(b)                  GAV pp1a/pp1ab residues
mutagenesis (5' → 3')(a)                                                 expressed(c)

AACGCATATGCCCAGGCAATCGATTC                   pMal-GAV-2793-3143          2793-3143
AAAGAATTCTTAGCAACGGAATCTGGTGAGAGGA
AACGCATATGCCCAGGCAATCGATTC                   pMal-GAV-2793-3059          2793-3059
AAAGAATTCTTACTGATAGTTGGTGGGGAGCTTTGGTGTTG
AACGCATATGCCCAGGCAATCGATTC                   pMal-GAV-2793-3028          2793-3028
AAAGAATTCTTAGACGGGCCAGACCTTTGGTGGATCGAC
ATCAGGCTCGGCTCAATGTCCACT                     pMal-GAV-2948-3143          2948-3143
AAAGAATTCTTAGCAACGGAATCTGGTGAGAGGA
GTTCGTACAGGTAACGCCACCACGGTC                  pMal-GAV-2832-3126          2832-3126
AAAGAATTCTTAGTTGCTGAGTGGAGAAAGGTCAGCAATA
AGGATGGTGATGCTGGTTCCATCATCTTCGACCACC         pMal-GAV-2832-3126_C2968A   2832-3126 Cys2968 → Ala
ATGATGGAACCAGCATCACCATCCTTGGTGGAGATG
GTTCGTACAGGTAACGCCACCACGGTC                  pMal-GAV-2832-3059          2832-3059
AAAGAATTCTTACTGATAGTTGGTGGGGAGCTTTGGTGTTG
GTTCGTACAGGTAACGCCACCACGGTC                  pMal-GAV-2832-3028          2832-3028
AAAGAATTCTTAGACGGGCCAGACCTTTGGTGGATCGAC
AACACTAACAATTGGGAACAAATAC                    pMal-GAV-6338-6673          6338-6673(d)
AAAGAATTCTTAAAATTTGATGAATCTGGGAGAT
CACTTCCCTCGACGCATCTTCGACACCTGCACTGACA        pMal-GAV-2793-3143_H2879R   2793-3143 His2879 → Arg
TGTCGAAGATGCGTCGAGGGAAGTGGAGGGATTTGC
CACTTCCCTCGACTCATCTTCGACACCTGCACTGACA        pMal-GAV-2793-3143_H2879L   2793-3143 His2879 → Leu
TGTCGAAGATGAGTCGAGGGAAGTGGAGGGATTTGC
AGGATGGTGATGCTGGTTCCATCATCTTCGACCACC         pMal-GAV-2793-3143_C2968A   2793-3143 Cys2968 → Ala
ATGATGGAACCAGCATCACCATCCTTGGTGGAGATG
AGGATGGTGATTCTGGTTCCATCATCTTCGACCACC         pMal-GAV-2793-3143_C2968S   2793-3143 Cys2968 → Ser
ATGATGGAACCAGAATCACCATCCTTGGTGGAGATG
TGAGTGAAGAATATGCTGCTACACCATTCATCAAAGTTG      pMal-GAV-2793-3143_D2917A   2793-3143 Asp2917 → Ala
GAATGGTGTAGCAGCATATTCTTCACTCAAAAGCTCGATG
TGAGTGAAGAATATGAGGCTACACCATTCATCAAAGTTG      pMal-GAV-2793-3143_D2917E   2793-3143 Asp2917 → Glu
GAATGGTGTAGCCTCATATTCTTCACTCAAAAGCTCGATG
TGAGTGAAGAATATCAAGCTACACCATTCATCAAAGTTG      pMal-GAV-2793-3143_D2917Q   2793-3143 Asp2917 → Gln
GAATGGTGTAGCTTGATATTCTTCACTCAAAAGCTCGATG
CGTCGGTGCCGCTATCGTCGGTATCTCCTGCATCCCT        pMal-GAV-2793-3143_H2983A   2793-3143 His2983 → Ala
TACCGACGATAGCGGCACCGACGACATTACCGAGGTG
CGTCGGTGCCTTTATCGTCGGTATCTCCTGCATCCCT        pMal-GAV-2793-3143_H2983F   2793-3143 His2983 → Phe
TACCGACGATAAAGGCACCGACGACATTACCGAGGTG
TCGTCGGTATCGCCTGCATCCCTCCAGTCAACGGTG         pMal-GAV-2793-3143_S2988A   2793-3143 Ser2988 → Ala
GGAGGGATGCAGGCGATACCGACGATATGGGCACCGA
TCGTCGGTATCCACTGCATCCCTCCAGTCAACGGTG         pMal-GAV-2793-3143_S2988H   2793-3143 Ser2988 → His
GGAGGGATGCAGTGGATACCGACGATATGGGCACCGA

(a) Underlined residues in the oligonucleotide sequence indicate mutant codons. 
(b) GAV sequences were inserted into the unique XmnI and EcoRI restriction sites of pMal-c2 plasmid DNA (New England Biolabs). 
(c) The GAV pp1a/pp1ab residues given were expressed as fusions with E. coli MBP. The amino acid residues are numbered according to the sequence published by Cowley et al. (8) (GenBank accession no. AF227196). 
(d) Amino acid numbering of the ORF1b-encoded portion of pp1ab is based on the prediction that -1 ribosomal frameshifting occurs at the sequence 12215AAAUUUU12221 (8). 

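
The mutagenic primer pairs in Table 1 are partially complementary, as a recombination-PCR approach requires. A minimal sketch of how that overlap can be checked is given below for the Cys2968 → Ala pair. The helper names are hypothetical, and the primer strings are transcribed from Table 1 and should be verified against the original before any reuse.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_overlap(forward, reverse):
    """Longest forward-primer prefix matching the end of the reverse complement."""
    rc = reverse_complement(reverse)
    for length in range(min(len(forward), len(rc)), 0, -1):
        if rc.endswith(forward[:length]):
            return forward[:length]
    return ""


# Cys2968 -> Ala primer pair as transcribed from Table 1.
forward = "AGGATGGTGATGCTGGTTCCATCATCTTCGACCACC"
reverse = "ATGATGGAACCAGCATCACCATCCTTGGTGGAGATG"
overlap = primer_overlap(forward, reverse)
print(len(overlap), overlap)
```

The shared segment contains the GCT codon that replaces the Cys codon, so both PCR products carry the mutation in the region through which they recombine.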

FIG. 1. Expression of GAV replicase gene. The ~20,000-nucleotide 
gene comprises ORFs 1a and 1b, which occupy the 5’-terminal 
region of the GAV genome and encode two replicase polyproteins, 
pp1a and pp1ab. Expression of pp1ab requires a -1 frameshift during 
translation, which is predicted to be mediated by a slippery heptanucleotide 
sequence and an RNA pseudoknot structure (8). The primary 
GAV pp1a/pp1ab-derived protein constructs used in this study are 
shown schematically. The N- and C-terminal residues of the GAV-specific 
amino acid sequences are given in the one-letter code. The 
numbering of pp1a/pp1ab amino acids is based on predictions on the 
GAV frameshift site, AAAUUUU (nucleotides 12215 to 12221 of the 
GAV genome) (8) (GenBank accession number AF227196). Fusions 
of GAV pp1a/pp1ab amino acids with E. coli MBP are indicated. Also, 
the positions of putative active-site Cys and His residues and the GAV 
3CLpro cleavage sites characterized in this study are given (C, H, and 
E|V, E|L, respectively). 


alignment tool Clustal X (42). In this study, Ala2918 was identified 
as a plausible candidate to occupy the main chain position 
equivalent to that of the catalytic Asp residue of potyvirus 
3CLpros, suggesting that GAV 3CLpro, like coronavirus 
3CLpros (2, 17), may lack a catalytic acidic residue in this 
region (Fig. 2). 

The computer-aided analysis of putative substrate-binding 
residues of 3CLpro produced a low-resolution model. GAV 
His2983, the previously proposed counterpart to the key S1 
subsite His residues of other 3C/3CLpros (8), was either at the 
edge or even outside of a stretch of matching residues in the 
GAV-versus-potyvirus and GAV-versus-coronavirus dot plots, 
respectively (Fig. 2). The low similarity in this region is due to 
the unusually short size of this segment in GAV 3CLpro and 
unique amino acid replacements in the immediate vicinity of 
GAV His2983 and the corresponding His residues in coronavirus 
3CLpros (15, 17) (Fig. 3). Accordingly, when the GAV 
3CLpro was compared separately with each of the two proteinase 
groups with Clustal X, another closely located residue of 
GAV, Ser2988, was aligned with the substrate-binding His (not 
shown). Five residues upstream of the catalytic Cys, a Thr/Ser 
residue which, in many 3C/3CLpros, together with His, makes 
contact with the substrate’s P1 Gln/Glu side chain (3, 16, 28-30), 
was found to be conserved in the GAV sequence (GAV 
Thr2963), suggesting that His (rather than Ser) is the most 
probable candidate to assume the key position in the S1 subsite. 

Apart from Thr?’°?, three other residues (His*”*’, Ile*°®, 
and Gly?”*') located nearby were revealed to be conserved 
among GAV and potyvirus but not coronavirus 3CLpros (Fig. 
3). Based on the available 3C/3CLpro structure information (2, 
4, 29, 30), these residues are likely to be part of the extended 
substrate-binding pocket. The observed sequence conservation 
suggested that the well-defined substrate specificity of potyvirus 
3CLpros (21) may, at least in part, be shared by the GAV 
enzyme. 

Nidovirus 3CLpros comprise two catalytic β-barrels and an 
extra C-terminal domain. In the viral polyprotein, they are 
flanked by well-conserved cleavage sites that are used to release 
the proteinase from adjacent transmembrane domains 
(15, 51). A similar domain organization was revealed in 
GAV, although the sequence conservation was rather low, 
especially outside the catalytic domains (Fig. 3). In striking 
contrast to other nidoviruses, we were unable to identify conservation 
in the immediate flanking regions of 3CLpro or, at 
least, dipeptides conforming to canonical 3CLpro cleavage sites 
[(Glu,Gln) | (Ser,Ala,Gly)], indicating that the GAV 3CLpro 
may have a deviant specificity and release itself from the precursor 
in a unique fashion. 

Proteolytic activity of GAV 3CLpro domain. To address the 
predicted proteolytic activity of the GAV 3CLpro, pp1a/pp1ab 
residues 2793 to 3143 (containing the presumed 3CLpro and a 
short N-terminal flanking region) were expressed as part of an 
MBP fusion protein (MBP-2793-3143) in E. coli. Based on 
studies on the related human coronavirus 3CLpro (49), the 
N-terminal region was expected to contain a 3CLpro site that 
could be autoprocessed in E. coli. As Fig. 4A (lanes 2 and 3) 
shows, induction of expression resulted in the synthesis of two 
proteins of ~47 and ~38 kDa that were not detectable in the 
noninduced control, suggesting proteolytic cleavage of the primary 
translation product, for which a molecular mass of 82 
kDa was calculated. The fact that the control protein, MBP-2793-3143_H2879R, 
in which Arg replaced the putative active-site 
His2879 residue, gave rise to the full-length protein (Fig. 
4A, lanes 4 and 5) provided conclusive evidence that, as predicted, 
GAV pp1a/pp1ab residues 2793 to 3143 contain a functional 
proteinase domain. 

To identify the N- and C-terminal portions of the cleaved 
protein, the lysate obtained from IPTG-induced E. coli 
TB1[pMal-GAV-2793-3143] cells was analyzed by Western 
blotting with specific antiserum. The data presented in Fig. 4B 
revealed that the 47-kDa protein was the N-terminal (that is, 
MBP-containing) cleavage product and that the 38-kDa protein 
was the C-terminal cleavage product containing the GAV 
pp1a/pp1ab sequence 2948 to 3143. 

FIG. 2. Profile-versus-profile dot plot cross-comparisons of GAV 3CLpro with coronavirus and potyvirus 3CLpros. Alignments of coronavirus 
and potyvirus 3C-like proteinases were converted into profiles and compared in a dot plot fashion, as described in Materials and Methods. Shown 
are the dot plots generated with a window of 35 amino acid residues. The projected positions of the catalytic residues (H46/H41 versus H2879, D81, 
C151/C144 versus C2968), as well as the substrate-binding H167/H162 residues versus H2983, are shown at each axis. Putative catalytic residues are 
designated by asterisks. Those dots that lie at any of the four possible crosses of projections of two functionally equivalent residues (e.g., H46 
and H2879) or close to a nonvisible diagonal passing these crosses belong or may belong to the true matches between two profiles. The rest of the 
dots are background hits (false-positives). 

trans-cleavage activity of recombinant GAV 3CLpro. From 
the data presented above, it could not be concluded whether 
the N-terminal 3CLpro cleavage had occurred in cis or was 
mediated by trans-acting precursors. Although the high cleavage 
efficiency indicated by the virtual absence of detectable 
precursors strongly suggested a cotranslational monomolecular 
reaction, we expected that the recombinant 3CLpro might 
also have trans-cleavage activity required by the native proteinase 
to process the full spectrum of cleavage sites assumed to 
exist in the 460-kDa and 758-kDa GAV replicase polyproteins. 
The demonstration of such trans-cleavage activity would also 
formally exclude the involvement of E. coli proteinases in the 
processing described in Fig. 4. 

trans-cleavage activity was examined with purified, recombinant 
3CLpro (for details, see Materials and Methods). Because 
of the uncertainty regarding the C-terminal border of 3CLpro 
(see below), we initially tested bacterially expressed proteins 
with C termini of different lengths (2832 to 3143 and 2832 to 
3126). Both proteins had proteolytic activity. We decided to 
use 2832-3126 in subsequent trans-cleavage experiments because 
of its superior stability. As a control, a protein with the 
same sequence but containing a substitution of the putative 
nucleophilic active-site Cys2968 residue (2832-3126_C2968A) 
was produced (Fig. 5). The purified proteins were incubated 
with bacterially expressed MBP-6338-6673 containing the C-terminal 
GAV pp1ab sequence corresponding to the coronavirus 
pp1ab region with the most C-terminal 3CLpro cleavage 
site (20, 25, 51). The data (Fig. 5) revealed that the wild-type 
proteinase but not the active-site mutant was active in trans, 
proving that GAV 3CLpro is indeed a proteinase. 

FIG. 3. Multiple sequence alignment of GAV, coronavirus, and potyvirus 3CLpro domains. The Clustal X-based alignment of corona- and 
potyvirus 3CLpros produced previously (17) was modified slightly to accommodate the results of the tertiary-structure analysis of a porcine 
coronavirus 3CLpro (2) and used to align the GAV 3CLpro sequence. For GAV and coronaviruses, this alignment was further expanded by including 
upstream and downstream sequences with Clustal X. Shown are the regions enriched in hydrophobic amino acid residues and flanking the 3CLpro 
from both the N terminus (C-terminal part of hydrophobic domain [HD3]) and the C terminus (entire HD4). These hydrophobic domains are 
conserved in all nidoviruses (14). For GAV and coronaviruses, the pp1a/1ab amino acid positions are given on the right; for potyviruses, the 
numbers refer to the amino acid positions in the 3CLpro. The column conservation in the two groups of coronavirus/GAV versus potyvirus 
sequences was highlighted separately with different colors for the following groups of amino acids: green for G, A, L, I, V, M, F, Y, and W; blue 
for H, K, and R; red for N, Q, E, and D; yellow for P; and violet for S and T. Columns with conserved or identical residues in all sequences are 
indicated by colons and solid squares, respectively, in the line separating the coronavirus/GAV and potyvirus groups. Empty squares highlight 
columns with identical residues in the GAV and potyvirus sequences. #, conserved catalytic Cys and His residues; @, P1-binding His residue 
conserved in all sequences and Thr residue conserved among GAV and potyviruses; solid circle, catalytic Asp residue of potyviruses. ><, positions 
of cleavage sites separating 3CLpro from flanking domains in corona- and potyviruses. Abbreviations of virus names and DDBJ/EMBL/GenBank 
accession numbers for the sequences are as follows: HCoV, human coronavirus (strain 229E) (X69721); TGEV, transmissible gastroenteritis virus 
(strain Purdue 115) (Z34093); PEDV, porcine epidemic diarrhea virus (strain CV777) (NC_003436); MHVA, murine hepatitis virus (strain A59) 
(NC_001846); BCoVI, bovine coronavirus (isolate LUN) (AF391542); IBV, avian infectious bronchitis virus (strain Beaudette) (M95169); TVMV, 
tobacco vein mottling virus (P09814); TUMVQ, turnip mosaic virus (strain Quebec) (Q02597); TEV, tobacco etch virus (P04517); PVY, potato 
virus Y (strain N) (P18247); PSBMV, pea seed-borne mosaic virus (strain DPD1) (P29152); PPVRA, plum pox virus (strain Rankovic) (P17767); 
PRSVH, papaya ringspot virus (strain P/mutant HA) (Q01901); PEMVC, pepper mottle virus (California isolate) (Q01500); BSMRV, Brome 
streak mosaic rymovirus (strain 11-Cal) (Q65730). 

Substrate specificity of GAV 3CLpro. To obtain information 
on the substrate specificity of 3CLpro, the structure of two cleavage 
sites was determined with mono- and bimolecular cleavage 
reactions. First, we determined the N-terminal sequence of the 
38-kDa C-terminal processing product of the MBP-2793-3143 
fusion protein precursor (Fig. 6). Proteins in the E. coli lysate 
analyzed in Fig. 4A (lane 3) were separated by SDS-PAGE, 
transferred electrophoretically to a polyvinylidene difluoride 
membrane, and stained with Coomassie brilliant blue, and the 
38-kDa protein was isolated and subjected to six cycles of 
Edman degradation. The data shown in Fig. 6 clearly indicated 
that cleavage occurred at the sequence 2827LVTHE | 
VRTGN2836, which identifies Val2832 as the N terminus of 
3CLpro. The observed molecular mass of the 3CLpro-containing 
cleavage product (38 kDa) slightly surpassed that calculated 
for this peptide sequence (34.8 kDa), making a second, 
C-terminal cleavage of MBP-2793-3143 unlikely. 
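
The size comparison made above can be approximated from residue counts alone. The sketch below uses the common rule of thumb of about 110 Da per residue, which is an assumption introduced here and not the method used to obtain the 34.8-kDa value; an exact mass would require the actual amino acid composition. The function name is hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def approx_mass_kda(first_residue, last_residue):
    """Estimate the mass of a polyprotein fragment from its residue count."""
    n_residues = last_residue - first_residue + 1
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# C-terminal product of MBP-2793-3143 after cleavage at Glu2831 | Val2832.
print(approx_mass_kda(2832, 3143))  # 312 residues, about 34 kDa
```

The estimate of roughly 34 kDa for residues 2832 to 3143 is in line with the calculated 34.8 kDa and sits below the observed 38 kDa, consistent with the absence of a second, C-terminal cleavage.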

Second, we conducted a similar N-terminal sequence analysis 
(data not shown) of the ~27-kDa C-terminal cleavage 
product from the trans-cleavage reaction documented in Fig. 5. 
This analysis unambiguously identified the scissile bond as 
6441KVNHE | LYHVA6450. As no other processing product 
was detected, it is reasonable to assume that the C-terminal 
processing product of GAV pp1ab is a 27-kDa protein encompassing 
amino acids 6446 to 6673. The data provided additional 
information on the GAV 3CLpro substrate specificity, 
which allows us to preliminarily propose VxHE | (L,V) as the 
consensus sequence of GAV 3CLpro cleavage sites. Although 
the picture is still incomplete, our data indicate that the substrate 
specificity of the GAV 3CLpro is well defined, as in 
vertebrate nidovirus main proteinases and many of their viral 
relatives, but differs from that of typical 3C/3C-like enzymes. 
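
The tentative consensus VxHE | (L,V) translates directly into a pattern search. The sketch below is an illustration only: it scans a sequence for the motif and reports the residue number at the P1' position. The function name is hypothetical, the two test fragments are the experimentally determined sites from this study, and hits in other sequences would be candidates, not demonstrated cleavage sites.

```python
import re

# V, any residue, H, E, followed (lookahead) by L or V at the P1' position.
CONSENSUS = re.compile(r"V.HE(?=[LV])")


def find_candidate_sites(sequence, first_residue_number=1):
    """Return the P1' residue numbers of all VxHE|(L,V) matches in `sequence`."""
    return [first_residue_number + match.end()
            for match in CONSENSUS.finditer(sequence)]


# The two sites characterized in this study, with their flanking residues.
print(find_candidate_sites("LVTHEVRTGN", 2827))
print(find_candidate_sites("KVNHELYHVA", 6441))
```

Applied to a full polyprotein sequence, the same search would list positions that merit experimental testing, with the caveat that a consensus derived from two sites is likely to be incomplete.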

Dispensability of C-terminal sequences for 3CLpro autoprocessing 
activity. The observed preference for substrates containing 
HEL or HEV tripeptides lends additional support to 
our hypothesis that there is no cleavage site between the 
3CLpro domain and the downstream putative membrane-spanning 
domain. It is thus tempting to speculate that, in contrast 
to the main proteinases of vertebrate nidoviruses, the GAV 
3CLpro is the N-terminal component of a larger protein. To 
determine whether the sequences downstream of the predicted 
two-β-barrel domain are essential for 3CLpro cleavage activity, 
we compared the proteolytic activities of two C-terminal MBP-2793-3143 
deletion mutants with that of the parental protein. 
As Fig. 7 shows, the two C-terminally truncated proteins had 
reduced but clearly detectable proteolytic activities, suggesting 
that the N-terminal 3CLpro region from residues 1 to 197 
(pp1a residues 2832 to 3028) contains all the 
structural elements and residues required for substrate binding 
and catalysis. Furthermore, comigration of the processed N-terminal 
product (Fig. 7) suggests that, in all three proteins 
with proteolytic activity, cleavage occurred at the same peptide 
bond. 

Active center of GAV 3CLpro. In a final set of experiments, 
the predictions of possible active-site residues (8) (Fig. 3) were 
tested by site-directed mutagenesis. The MBP-2793-3143 
protein encoded by the parental plasmid construct pMal- 
GAV-2793-3143 (Table 1, Fig. 1) and characterized in the 
experiments shown in Fig. 4 was used as a positive control. 
Single-amino-acid substitutions were introduced into this con- 
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FIG. 4. Proteolytic activity of GAV ppla/pplab amino acids 2793 
to 3143. (A) Total cell lysates from E. coli TB1 cells transformed with 
pMal-GAV-2793-3143 (lanes 2 and 3, WT) and pMal-GAV-2793- 
3143_H?*”R (lanes 4 and 5, H*8”’R) were separated by SDS-PAGE in 
a 12.5% polyacrylamide gel and stained with Coomassie brilliant blue 
R-250. The bacteria were mock induced (lanes 2 and 4) or induced 
with 1 mM IPTG for 3 h (lanes 3 and 5). The positions of the fusion 
proteins and cleavage products are indicated, and the molecular 
masses of marker proteins (lane 1) are given (in kilodaltons). (B) The 
protein lysate shown in panel A (lane 3) was separated by SDS-PAGE 
in a 10% polyacrylamide gel, transferred to a nitrocellulose membrane, 
and immunostained with MBP-2948-3143-specific rabbit antiserum 
(lane 1) or MBP-specific antiserum (New England Biolabs) (lane 2). 
The positions of the N-terminal (i.e., MBP-containing) and C-terminal 
cleavage products are indicated, and the positions of marker proteins 
are given (with masses in kilodaltons). 
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FIG. 5. trans-cleavage activity of GAV 3CL?°. Recombinant GAV 
3CL?° encompassing 295 amino acids (2832 to 3126) and an active-site 
mutant (2832-3126 _C?°°8A) were bacterially expressed, purified, and 
incubated with an MBP fusion protein substrate, MBP-6338-6673, 
containing the C-terminal GAV pplab sequence (see Materials and 
Methods for details). Lanes: 1, marker proteins, with molecular masses 
indicated in kilodaltons; 2, MBP-6338-6673 incubated with buffer; 3, 
MBP-6338-6673 incubated with 2832-3126; 4, 2832-3126 incubated 
with buffer; 5, MBP-6338-6673 incubated with buffer; 6, MBP-6338- 
6673 incubated with 2832-3126_C?°°8A; 7, 2832-3126_C?°°8A incu- 
bated with buffer. Cleavage products of MBP-6338-6673 are indicated 
by arrowheads. 


struct, and their effects were studied by analyzing the autopro- 
cessing activities of the MBP-2793-3143 mutants. The data 
shown in Fig. 8 revealed that replacements of the predicted 
catalytic His*8”° (by Arg and Leu) and Cys??°* (by Ala and Ser) 
residues completely abolished proteolytic activity, supporting 
the proposed catalytic function of these residues. In contrast, 
all the Asp2917 mutants (D2917A, D2917E, and D2917Q) retained
their activities in the assay used. This result is consistent with
our sequence comparison data, which also argued against a
catalytic function of this residue (see Fig. 3).

Mutagenesis of His2983 resulted in proteolytically inactive
proteins, whereas the Ser??** mutants retained wild-type activ- 
ity. These data make His the most probable candidate for 
the key position in the S1 subsite of the 3CLpro substrate-binding
pocket. We speculate that His2983 may cooperate with
a threonine residue (Thr2963) that, as in many other 3C/3C-like
proteinases (3, 16, 29, 30), is located 5 residues upstream of the
presumed GAV 3CLpro principal nucleophile (Cys2968) and,
together with the imidazole side chain of histidine, may contact
the P1 side chain of the substrate. The results thus fully support
our predictions of the putative GAV 3CLpro active-site residues
(see above and Fig. 3).


DISCUSSION 


GAV is the first invertebrate nidovirus to be characterized at
the molecular level. It infects black tiger prawns and represents
the prototype of newly established taxa, genus Okavirus, family
Roniviridae, within the order Nidovirales (8, 11). In this study,
the viral main proteinase, a 3C-like cysteine proteinase, was
characterized. Despite the wealth of information available for
diverse 3CLpros, predictions of the key features of the GAV
3CLpro proved to be challenging because of the unique phylogenetic
position of this invertebrate nidovirus. Nevertheless,
we were able to produce a coherent picture with a combination
of bioinformatics and biochemical and genetic methods.

FIG. 6. Characterization of N-terminal GAV 3CLpro autoprocessing site by protein sequencing. The C-terminal MBP-2793-3143 cleavage
product (Fig. 4A, lane 3) was subjected to Edman degradation, and phenylthiohydantoin (PTH)-amino acids generated during each reaction cycle
were detected by their absorbance at 269 nm (expressed as milliabsorption units) and identified by their characteristic retention times on a
reversed-phase high-pressure liquid chromatography support. (A) Chromatogram of PTH-amino acid standards. (B to F) Chromatograms of
PTH-amino acids from reaction cycles 1 to 5. Specific peaks of PTH-amino acids are indicated by the single-letter code.

Previous studies of coronavirus 3CLpros suggested that ancestors
of these enzymes accepted unprecedented substitutions
in most of the conserved positions of the catalytic system and
the substrate pocket, making this group of enzymes an outlier
among the huge family of viral and cellular chymotrypsin-like
homologs (2, 15, 17). We now provide evidence that GAV
3CLpro represents an evolutionary link between the 3CLpros of
coronaviruses and (all the) other positive-stranded RNA viruses.
Specifically, our data indicate that the unique replacements
in coronavirus 3CLpros of otherwise strictly conserved
residues must have been acquired gradually in the nidovirus
lineage. In this context, the GAV 3CLpro seems to emerge as
an important model to study (separately) the functional effects
of the (abridged) Cys-His catalytic system. This is possible
because, in contrast to coronavirus 3CLpros, which feature both
a Cys-His catalytic center and a noncanonical substrate pocket,
the GAV 3CLpro Cys-His catalytic center seems to be combined
with a canonical (potyvirus-like) substrate pocket (see
below and Fig. 9).

Catalytic system of 3CLpro. Sequence comparisons revealed
that the GAV 3CLpro has very little similarity to other RNA
viral 3C-like proteinases (Fig. 2) (8). Even with the closest
known relatives, potyvirus NIa and coronavirus main proteinases,
similarities to the GAV 3CLpro are restricted essentially
to the regions containing the putative Cys and His active-site
residues (Fig. 2), which made sequence alignments in other
regions less robust. Our experimental evidence strongly suggests
that 3CLpro employs a catalytic dyad composed of Cys2968
and His2879. The mutagenesis data did not corroborate earlier
predictions of a third catalytic residue (Asp2917) (8). Instead,
the acidic residue appears to be replaced in GAV by the
neutral Ala2911 residue (Fig. 3).

It should be noted that an equivalent of the Asp residue of
the chymotrypsin catalytic triad is also missing in coronavirus
3CLpros (2, 17, 24, 50). Also, in the crystal structure of the
hepatitis A virus 3C proteinase, the side chain of the conserved
Asp residue adopts an unexpected orientation (1, 4). Even
though the hepatitis A virus Asp84 residue occupies the expected
position in the main chain, it forms a salt bridge with
the ε-amino group of a Lys side chain from strand fII (4) rather
than interacting with the catalytic His44, and thus, a catalytic
function is unlikely. Apparently, in an appropriate environment,
the relatively low pKa of the Cys nucleophile (compared
to that of Ser) may fully or partially relieve some 3C/3C-like
cysteine proteinases from dependence on an Asp (Glu) carboxylate
group, which is usually required to stabilize the developing
positive charge on the catalytic histidine residue during
serine proteinase catalysis (13, 23, 27).

Substrate specificity. In this study, initial information on the
substrate specificity of the GAV 3CLpro was obtained by determining
the N-terminal 3CLpro autoprocessing site and a
second 3CLpro cleavage site in the C-terminal region of pp1ab.
The sequences flanking the scissile bonds, 2827-LVTHE↓VRTGN-2836
and 6441-KVNHE↓LYHVA-6450, share the
VxHE↓(L,V) motif. Inspection of coronavirus/GAV replicase
alignments (A. E. Gorbalenya and J. Ziebuhr, unpublished
data) leads us to believe that Val/Thr/Ser and Leu/Val/Ile/Gly/Ser/Ala
at the substrate P4 and P1' positions, respectively, may
be compatible with proteolysis by GAV 3CLpro. This conservation
pattern suggests that the P4, P2, P1, and P1' positions
are the major 3CLpro specificity determinants. The same positions
are critical in corona- and potyvirus 3CLpro cleavage sites,
which provides further support for combining the GAV, corona-,
and potyvirus 3CLpros in a separate group.

FIG. 7. Effect of C-terminal deletions on the self-processing activity
of MBP-2793-3143. (A) Total cell lysates from E. coli TB1 cells
transformed with pMal-GAV-2793-3143 (lanes 1 and 2; 2793-3143),
pMal-GAV-2793-3143_C2968A (lanes 3 and 4; 2793-3143_C2968A),
pMal-GAV-2793-3028 (lanes 5 and 6; 2793-3028), and pMal-GAV-2793-3059
(lanes 7 and 8; 2793-3059) were separated by SDS-PAGE in
a 12.5% polyacrylamide gel and stained with Coomassie brilliant blue
R-250. The bacteria were mock induced (lanes 1, 3, 5, and 7) or
induced with 1 mM IPTG for 3 h (lanes 2, 4, 6, and 8). The positions
of the fusion proteins and cleavage products are indicated, and the
molecular masses of marker proteins (lane M) are given (in kilodaltons).
(B) The cell lysates shown in panel A were separated by SDS-PAGE,
transferred to a nitrocellulose membrane, and immunostained
with anti-MBP antiserum (New England Biolabs). The positions of the
uncleaved fusion proteins and the N-terminal (i.e., MBP-containing)
cleavage products are indicated, and the positions of marker proteins
are given (with masses in kilodaltons).

FIG. 8. Mutational analysis of active center of GAV 3CLpro.
(A) The proteolytic activities of bacterially expressed MBP-2793-3143
proteins carrying substitutions of putative active-site residues were
examined by SDS-PAGE of cell lysates obtained after IPTG-induced
(3 h, 24°C) protein expression. The introduced amino acid substitutions
and the positions of both uncleaved fusion proteins and cleavage
products are indicated. The proteolytic activity of the wild-type
MBP-2793-3143 (WT) (see also Fig. 4) served as a positive control. (B) The
cell lysates shown in panel A were separated by SDS-PAGE, transferred
to a nitrocellulose membrane, and immunostained with anti-MBP
antiserum (New England Biolabs). The positions of the uncleaved
fusion proteins and the N-terminal (that is, MBP-containing)
cleavage products are indicated. Also shown are the positions of
molecular mass markers (with masses given in kilodaltons).
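
The positional comparison behind this argument can be sketched in a few lines.
The example below is an illustration rather than the analysis used in the
study: label_positions is a hypothetical helper that assigns the standard
P5 to P5' labels to a ten-residue window, and the two windows are the
experimentally mapped sites with the scissile bond after the fifth residue.

```python
def label_positions(window, n_before):
    """Map P-site labels (P5..P1, P1'..P5') to the residues of a window.

    window   -- residues flanking the scissile bond, N- to C-terminal
    n_before -- number of residues in the window that precede the bond
    """
    labels = {}
    for i, residue in enumerate(window):
        if i < n_before:
            labels[f"P{n_before - i}"] = residue
        else:
            labels[f"P{i - n_before + 1}'"] = residue
    return labels


site_a = label_positions("LVTHEVRTGN", 5)  # N-terminal autoprocessing site
site_b = label_positions("KVNHELYHVA", 5)  # site in the C-terminal pp1ab region

# Positions at which the two mapped sites carry the same residue.
shared = {pos: res for pos, res in site_a.items() if site_b[pos] == res}
print(shared)                        # {'P4': 'V', 'P2': 'H', 'P1': 'E'}
print(site_a["P1'"], site_b["P1'"])  # V L
```

The two sites are identical at P4, P2, and P1 and carry Val or Leu at P1',
which is the pattern summarized by the tentative consensus. With only two
mapped sites, identity at a position does not by itself establish that the
position is a specificity determinant.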

Whereas the presence of Glu (or Gln) at the P1 position is
a typical feature of RNA virus 3C/3CLpro substrates (16, 36),
the GAV 3CLpro preferences at the other conserved positions
are less common and, taken together, give this proteinase a
unique substrate specificity formula. Interestingly, some plant
potyvirus NIa 3C-like proteinases (21, 33, 44, 48) share the P2
His substrate specificity with the GAV 3CLpro. It is also noteworthy
that, unlike most other 3C/3C-like proteinases, GAV
3CLpro seems to possess a relatively large (hydrophobic) S1'
subsite, which would accommodate the branched side chains of
valine and leucine.

A striking parallel between GAV 3CLpro and various
well-characterized positive-stranded RNA virus homologs (3, 16,
28-30) is the conservation of the pair of His/Thr residues in the
S1 subsite. Our hypothesis that the corresponding GAV
                      CATALYTIC RESIDUES

                      *    *         *
Enterovirus (PV)      H    E    T    C    G    H
Hepatovirus (HAV)     H   (D)   G    C    G    H
Nepovirus (TBRV)      H    E    S    C    G    L
Potyvirus (PEMV)      H    D    T    C    G    H
Ronivirus (GAV)       H         T    C    G    H
Coronavirus (HCoV)    H              C    Y    H
Arterivirus (EAV)     H    D    T    S    G    H
                                #         #    #

                      SUBSTRATE POCKET RESIDUES

FIG. 9. Variations in catalytic and substrate-binding residues of
RNA viral chymotrypsin-like proteinases. PV, poliovirus; HAV,
hepatitis A virus; TBRV, tomato black ring virus; PEMV, pepper mottle
virus; HCoV, human coronavirus; EAV, equine arteritis virus. The key
catalytic (*) and substrate-binding pocket (#) residues are indicated.
The catalytic Asp residue of hepatitis A virus is shown in brackets
because its side chain orientation in the hepatitis A virus 3Cpro crystal
structure (1, 4) argues against the proposed catalytic function (see text
for details).


3CLpro residues (Thr2963 and His2983) may play an equivalent
role is further supported by the local conservation of the
corresponding region among GAV and potyvirus 3CLpros (Fig. 2
and 3) and our mutagenesis data (see above). Despite these
similarities, it is likely that additional (poorly recognized)
determinants may tune the P1 specificity in a virus-specific manner.
Thus, for example, it is conceivable that the 3CLpros of
GAV and arteriviruses, which both recognize a P1 Glu (rather
than Gln) side chain (38; this paper), have similarly organized
S1 subsites.

Cleavage at C terminus of 3CLpro. RNA virus (including
vertebrate nidovirus) 3C/3CLpros are commonly released from
the replicase polyproteins by autocatalytic processing. In some
cases, the N- and C-terminal sites are cleaved with different
kinetics. Thus, for example, C-terminal 3C/3CLpro cleavage
occurs more slowly (picornaviruses) (36), is tightly regulated
(arteriviruses) (45), or is totally lacking (some caliciviruses)
(39, 46). In our experiments, no evidence was obtained for
cleavage in the region immediately downstream of the GAV
3CLpro which, according to comparative sequence analysis
(Fig. 3), also does not contain potential [that is, VxHE↓(L,V)]
cleavage sites.

It is possible that a site immediately downstream of the
proteinase domain might be cleaved by a cellular proteinase.
However, this would be unprecedented based on data for other
viral 3CLpros. Alternatively, domains from other regions of the
viral polyprotein, which are missing in our constructs, might
assist in autoprocessing at a C-terminal 3CLpro site with a
deviant structure. For instance, studies of the arterivirus
equine arteritis virus have revealed that the C-terminal release
of the nsp4 proteinase from the nsp4-8 precursor requires nsp2
as a cofactor (45). Further studies with larger GAV 3CLpro-containing
precursor proteins and alternative expression systems,
including insect cells and primary crustacean cells (31),
may help to address this question more rigorously.

If GAV 3CLpro and the downstream hydrophobic domain
are not separated by proteolytic cleavage, as our results suggest,
then the proteinase would remain anchored to intracellular
membranes throughout the replication cycle. To some
extent, this association would resemble the situation in the
arterivirus equine arteritis virus and the coronavirus mouse
hepatitis virus, in which significant amounts of nsp4 and
3CLpro, respectively, are known to remain part of long-lived
(or even stable) precursors which possess flanking hydrophobic
domains on either one or both sides (22, 37, 45).

Domain structure of 3CLpro. In contrast to other 3C/3CLpros,
which consist of two catalytic β-barrel domains (1, 4,
28-30), nidovirus and potyvirus 3CLpros possess an extra
C-terminal domain of variable size (51). This additional domain
is also present in the GAV 3CLpro, although its precise size
remains to be determined. In coronavirus 3CLpros, the
C-terminal domain is involved in trans-cleavage activity (2, 26, 32,
50). Recent crystal structure analysis of the transmissible
gastroenteritis virus 3CLpro showed that the domain adopts a
unique α-helical structure that interacts with the enzyme's N
terminus. This interaction fixes the orientation of a loop region
involved in substrate binding (2).

The fact that the C-terminally truncated, 197-residue GAV
3CLpro (Fig. 1 and 7) retained significant autoprocessing activity
when expressed as an MBP fusion protein argues against
an equally important role for the C-terminal domain of GAV
3CLpro, at least in cis reactions. The effects of C-terminal
deletions on the activity in trans remain to be determined. This
experiment is of special interest because coronavirus 3CLpros
have been shown to be differentially affected by C-terminal
deletions in cis- versus trans-cleavage reactions (2, 26, 32, 50).

Taken together, the differences and similarities revealed in 
this study between the main proteinase of a crustacean nidovi- 
rus and its viral homologs indicate a novel pattern of functional 
and structural conservation that has not been observed in any 
of the previously characterized proteinases from mammalian 
and plant pathogens. We are confident that, from an evolu- 
tionary perspective, the characterization of proteins of posi- 
tive-stranded RNA viruses isolated from less-characterized 
habitats will allow valuable insights into the evolution of vi- 
ruses and help identify both missing phylogenetic links and 
evolutionary forces operating in specific biological systems. 